4.2 - Design - Expanding the State and Action Space
To graduate our agent from a single-purpose actor to a decision-maker, we must evolve its environment. This involves expanding its "senses" (the observation space) and its "abilities" (the action space) to provide the necessary tools for learning the more complex task of balancing economic priorities.
This document details the design changes from the `WorkerEnv` to the new `MacroEnv`.
Agent Requirements Analysis
The new task requires the agent to:
- Sense when its supply capacity is becoming a limiting factor.
- Possess the ability to increase that supply capacity.
This mandates a direct evolution of our environment's API.
1. Observation Space Evolution
The agent needs a more complete picture of its supply situation than just `supply_left`. We will provide the raw components of supply to allow the agent to learn the relationship itself.
Design Comparison:

- `WorkerEnv` Observation Space: `Box(3,)`

| Index | Feature |
| --- | --- |
| 0 | Minerals |
| 1 | Worker Count |
| 2 | Supply Left |

- `MacroEnv` Observation Space: `Box(4,)`

| Index | Feature | Rationale for Change |
| --- | --- | --- |
| 0 | Minerals | (Unchanged) Required for affordability checks. |
| 1 | Worker Count | (Unchanged) Required as a progress metric. |
| 2 | `supply_used` | (New) Provides the "demand" side of the supply equation. |
| 3 | `supply_cap` | (New) Provides the "supply" side of the equation. |
Rationale: Providing `supply_used` and `supply_cap` separately is a more robust design. It gives the agent the raw data and allows the neural network to learn the concept of "supply pressure" on its own, which can lead to a more nuanced and effective policy.
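To make this concrete, here is a minimal sketch of how the expanded observation space could be declared, assuming a gymnasium-style environment. The class skeleton, the bounds, and the `_get_obs` helper name are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class MacroEnv(gym.Env):
    """Skeleton showing only the expanded Box(4,) observation space."""

    def __init__(self):
        super().__init__()
        # Index 0: minerals, 1: worker count, 2: supply_used, 3: supply_cap.
        # The bounds are placeholder assumptions, not tuned values.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(4,), dtype=np.float32
        )

    def _get_obs(self, minerals, worker_count, supply_used, supply_cap):
        # Expose the raw components; the policy network is left to learn
        # "supply pressure" (supply_used approaching supply_cap) on its own.
        return np.array(
            [minerals, worker_count, supply_used, supply_cap],
            dtype=np.float32,
        )
```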
2. Action Space Evolution
To meet the new requirements, the agent must be given the ability to build a supply structure.
Design Comparison:

- `WorkerEnv` Action Space: `Discrete(2)`

| Action | Meaning |
| --- | --- |
| 0 | Do Nothing |
| 1 | Build Worker |

- `MacroEnv` Action Space: `Discrete(3)`

| Action | Meaning | Rationale for Change |
| --- | --- | --- |
| 0 | Do Nothing | (Unchanged) A passive choice is always required. |
| 1 | Build Worker | (Unchanged) The primary economic action. |
| 2 | Build Supply | (New) Directly provides the agent with the tool needed to solve the new problem. |
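A matching sketch of the expanded action space and its dispatch, again assuming a gymnasium-style API. The `game` handle and its `build_worker` / `build_supply` methods are hypothetical stand-ins for the real game interface.

```python
from gymnasium import spaces

# Discrete(3): 0 = Do Nothing, 1 = Build Worker, 2 = Build Supply.
action_space = spaces.Discrete(3)


def apply_action(action: int, game) -> None:
    # `game` is a hypothetical handle to the underlying game interface.
    if action == 1:
        game.build_worker()   # unchanged from WorkerEnv
    elif action == 2:
        game.build_supply()   # new: the tool for relieving supply pressure
    # action == 0 deliberately does nothing: the passive choice.
```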
These modifications transform the environment from a simple, single-goal task to a more dynamic system where the agent must learn to prioritize actions based on a richer set of inputs.
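For context, a typical interaction loop over the new spaces might look like the following, assuming `MacroEnv` fully implements the standard gymnasium `reset`/`step` API (the skeleton above omits those methods).

```python
env = MacroEnv()
obs, info = env.reset()
for _ in range(1_000):
    action = env.action_space.sample()  # a trained policy would replace this
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```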